# Custom Stata Functions Reference

This document describes the custom Stata programs used in the replication package: their syntax, parameters, and outputs. All functions are located in `dofiles/functions/`.

## Table of Contents

- [Custom Stata Functions Reference](#custom-stata-functions-reference)
  - [Table of Contents](#table-of-contents)
  - [Index Construction](#index-construction)
    - [index\_maker](#index_maker)
    - [group\_mean](#group_mean)
    - [index\_cov\_maker](#index_cov_maker)
  - [Attrition Bounds](#attrition-bounds)
    - [lee\_bounds\_trimming\_ado](#lee_bounds_trimming_ado)
  - [Regression and Table Output](#regression-and-table-output)
    - [multivar\_reg](#multivar_reg)
    - [multivar\_reg\_mat](#multivar_reg_mat)
    - [mv\_reg\_table](#mv_reg_table)
  - [Temporal Heterogeneity Tests](#temporal-heterogeneity-tests)
    - [round\_test\_reg](#round_test_reg)
    - [round\_test\_reg\_NSM](#round_test_reg_nsm)
    - [het\_test\_reg](#het_test_reg)
    - [equation\_test](#equation_test)
  - [Dependencies](#dependencies)
  - [Notes](#notes)

---

## Index Construction

### index_maker

**File:** `dofiles/functions/indexmaker.do` (lines 2-52)

**Purpose:** Creates a standardized index from multiple input variables by standardizing each component and aggregating them.

**Syntax:**

```stata
index_maker, indexname(name) indexlab(string) inputvars(namelist) suffix(name) operation(name) [options]
```

**Required Parameters:**

| Parameter | Type | Description |
| ----------- | ------ | ------------- |
| `indexname` | name | Name for the output index variable (without suffix) |
| `indexlab` | string | Variable label for the index |
| `inputvars` | namelist | List of input variable stems (without suffix) |
| `suffix` | name | Time period suffix (e.g., `tyav`, `ltav`, `b`) |
| `operation` | name | Aggregation method: `sum` or `mean` |

**Optional Parameters:**

| Parameter | Type | Description |
| ----------- | ------ | ------------- |
| `samplestd(varname)` | varname | Variable that defines the standardization sample (used together with `sampleval`) |
| `sampleval(numlist)` | numlist | Values of `samplestd` to include in the standardization sample |
| `keepstd(name)` | name | Set to `yes` to keep the standardized component variables |
| `recenter` | flag | Re-standardize the final index to mean=0, sd=1 |
| `imputeover(varname)` | varname | Group variable for median imputation of missing values |
| `rescale(namelist)` | namelist | Variables whose sign is flipped before aggregation (so higher = better) |

**Example:**

```stata
* Create antisocial behavior index averaged over 10-year rounds
index_maker, indexname(fam_asb) indexlab("Antisocial Behavior Index") ///
    inputvars(drugssellever stealnb disputes_all carryweapon arrested asbhostil domabuse) ///
    suffix(tyav) operation(mean) ///
    samplestd(control) sampleval(1) ///
    recenter ///
    rescale(drugssellever stealnb)
```

**Output:** Creates variable `fam_asb_tyav` with label "Antisocial Behavior Index"
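
As a rough illustration of what "standardize each component and aggregate" involves, the sketch below builds a mean-of-z-scores index by hand on Stata's bundled `auto` dataset. It is not the code in `indexmaker.do`: the variable names (`z_*`, `idx_demo`), the use of `foreign == 0` as a stand-in for `samplestd(control) sampleval(1)`, and the choice to recenter on that same reference sample are all assumptions made for the example.

```stata
* Illustrative sketch only; names and sample choice are hypothetical
sysuse auto, clear

local components price weight length
local zvars

* Standardize each component using the reference sample's mean and sd
foreach v of local components {
    quietly summarize `v' if foreign == 0
    generate double z_`v' = (`v' - r(mean)) / r(sd)
    local zvars `zvars' z_`v'
}

* Equivalent of operation(mean): row-wise average of the z-scores
egen double idx_demo = rowmean(`zvars')

* Equivalent of recenter: re-standardize the aggregate
* (assumed here to use the same reference sample)
quietly summarize idx_demo if foreign == 0
replace idx_demo = (idx_demo - r(mean)) / r(sd)
label variable idx_demo "Demo index (hypothetical)"

* Mimic the default of dropping the standardized components
drop `zvars'
summarize idx_demo if foreign == 0
```

Standardizing against a reference group rather than the full sample means the index is read in units of that group's standard deviation, which is presumably why `samplestd` and `sampleval` are exposed as options.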

---

### group_mean

**File:** `dofiles/functions/indexmaker.do` (lines 55-71)

**Purpose:** Creates wave-level averages by averaging each variable across a chosen set of survey rounds.

**Syntax:**

```stata
group_mean, inputvars(namelist) suffixout(name) suffixin(name) id(varname) groupvar(varname) groups(numlist)
```

**Parameters:**

| Parameter | Type | Description |
| ----------- | ------ | ------------- |
| `inputvars` | namelist | Variable stems to average |
| `suffixout` | name | Suffix for output variables |
| `suffixin` | name | Suffix of input variables |
| `id` | varname | Panel identifier (e.g., `partid`) |
| `groupvar` | varname | Variable identifying survey rounds |
| `groups` | numlist | Round numbers to include in average |

**Example:**

```stata
* Create 10-year average (rounds 7 and 8)
group_mean, inputvars(income consumption) suffixout(tyav) suffixin(e) ///
    id(partid) groupvar(round) groups(7 8)
```

**Output:** Creates `income_tyav` and `consumption_tyav` as averages across rounds 7 and 8.
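
The averaging step can be pictured with a small simulated panel. This is a sketch under assumptions, not the body of `group_mean`: the data are generated on the spot, and the example assumes the average is taken within `partid` and written to every row of that participant.

```stata
* Illustrative sketch only; simulated data
clear
set seed 12345
set obs 40
generate partid = ceil(_n / 4)            // 10 hypothetical participants
bysort partid: generate round = _n + 4    // rounds 5, 6, 7, 8
generate income_e = rnormal(100, 15)

* Average income_e over rounds 7 and 8 within participant;
* rows from other rounds contribute missing and are ignored by mean()
bysort partid: egen double income_tyav = ///
    mean(cond(inlist(round, 7, 8), income_e, .))

list partid round income_e income_tyav in 1/8, sepby(partid)
```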

---

### index_cov_maker

**File:** `dofiles/functions/indexmaker.do` (lines 80-167)

**Purpose:** Creates an index using inverse-covariance (GLS) weighting, following Anderson (2008) and Kling, Liebman, and Katz (2007).

**Syntax:**

```stata
index_cov_maker, indexname(name) indexlab(string) inputvars(namelist) suffix(name) operation(name) [options]
```

**Parameters:** Same as `index_maker`, except that there is no `keepstd` option.

**Methodology:**

1. Standardize each component to z-scores
2. Calculate the covariance matrix of standardized components
3. Invert the covariance matrix
4. Weight each component by its row sum from the inverted matrix
5. Sum weighted components and divide by total weight
6. Optionally recenter to mean=0, sd=1
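
The six steps above can be sketched with Stata's matrix commands. This is a minimal illustration on the bundled `auto` dataset, not the code in `indexmaker.do`; the variable and matrix names (`z_*`, `S`, `Sinv`, `W`, `idx_cov_demo`) are hypothetical, and the example uses components with no missing values so that imputation does not arise.

```stata
* Illustrative sketch only; names are hypothetical
sysuse auto, clear
local components price weight length
local zvars

* Step 1: z-scores
foreach v of local components {
    quietly summarize `v'
    generate double z_`v' = (`v' - r(mean)) / r(sd)
    local zvars `zvars' z_`v'
}

* Steps 2-3: covariance matrix of the z-scores and its inverse
quietly correlate `zvars', covariance
matrix S = r(C)
matrix Sinv = invsym(S)

* Step 4: weight for each component = row sum of the inverse
matrix W = Sinv * J(rowsof(Sinv), 1, 1)

* Step 5: weighted sum of components divided by total weight
generate double idx_cov_demo = 0
scalar wtot = 0
local k = 0
foreach z of local zvars {
    local ++k
    replace idx_cov_demo = idx_cov_demo + W[`k', 1] * `z'
    scalar wtot = wtot + W[`k', 1]
}
replace idx_cov_demo = idx_cov_demo / wtot

* Step 6 (optional): recenter to mean 0, sd 1
quietly summarize idx_cov_demo
replace idx_cov_demo = (idx_cov_demo - r(mean)) / r(sd)

matrix list W
```

Components that are highly correlated with the others receive smaller row sums, so they contribute less to the index than components carrying more independent information.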

**Example:**

```stata
index_cov_maker, indexname(fam_asb_cov) indexlab("ASB Index (Cov-weighted)") ///
    inputvars(drugssellever stealnb disputes_all carryweapon) ///
    suffix(tyav) operation(mean) recenter
```

**Output:** Creates `fam_asb_cov_cov_tyav`, which holds the covariance-weighted index.

---

## Attrition Bounds

### lee_bounds_trimming_ado

**File:** `dofiles/functions/attrition.do` (lines 2-131)

**Purpose:** Implements Lee (2009) bounds for attrition by trimming observations from treatment arms with higher response rates.

**Syntax:**

```stata
lee_bounds_trimming_ado varlist, treatments(varlist) controls(varlist) attritvar(varname) direction(name) [options]
```

**Required Parameters:**

| Parameter | Type | Description |
| ----------- | ------ | ------------- |
| `varlist` | varname | Outcome variable (max 1) |
| `treatments` | varlist | Treatment indicator variables |
| `controls` | varlist | Control group indicator (max 1) |
| `attritvar` | varname | Attrition indicator (1=attrited, 0=found) |
| `direction` | name | Effect direction: `plus` (positive effect expected) or `minus` |

**Optional Parameters:**

| Parameter | Type | Description |
| ----------- | ------ | ------------- |
| `condcontrols` | varlist | Conditioning controls for residualization |
| `condfe` | varlist | Fixed effects for conditioning |
| `roundvar` | varname | Survey round variable |
| `round` | numlist | Specific round to analyze (max 1) |

**How It Works:**

1. Calculate attrition rates for each treatment arm
2. Identify the arm with the lowest response rate (baseline)
3. For other arms, calculate how many observations to trim to equalize response rates
4. Create upper bound: trim highest outcome values from arms with higher response
5. Create lower bound: trim lowest outcome values from arms with higher response
6. The sign of trimming depends on the `direction` parameter
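
To make the trimming logic concrete, here is a two-arm sketch on simulated data. It is an assumption-laden illustration rather than the code in `attrition.do`: it handles a single treatment arm, skips the conditioning controls, and uses hypothetical names (`y_trimtop`, `y_trimbot`). Which trimmed copy corresponds to the upper or lower bound in the package depends on `direction`, so the sketch labels them only by which tail was removed.

```stata
* Illustrative sketch only; simulated data, one treatment arm
clear
set seed 2009
set obs 1000
generate treat = runiform() < 0.5
* Hypothetical attrition: about 20% in control, 10% in treatment
generate attrited = runiform() < cond(treat, 0.10, 0.20)
generate y = rnormal() + 0.2 * treat if !attrited

* Step 1: response rates by arm
quietly summarize attrited if treat == 1
scalar resp_t = 1 - r(mean)
quietly summarize attrited if treat == 0
scalar resp_c = 1 - r(mean)

* Step 2: the sketch assumes the control arm has the lower response rate
assert resp_t > resp_c

* Step 3: share and number of treated respondents to trim
scalar trimshare = (resp_t - resp_c) / resp_t
quietly count if treat == 1 & !missing(y)
local nresp = r(N)
local ntrim = floor(trimshare * `nresp')

* Rank treated respondents by outcome (ties broken arbitrarily)
egen long rank_y = rank(y) if treat == 1, unique

* Steps 4-5: drop the top or the bottom tail of the treated arm
generate y_trimtop = y
replace y_trimtop = . if treat == 1 & !missing(rank_y) & rank_y > `nresp' - `ntrim'
generate y_trimbot = y
replace y_trimbot = . if treat == 1 & rank_y <= `ntrim'

* The two trimmed regressions bracket the untrimmed estimate
regress y_trimtop treat
regress y_trimbot treat
```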

**Example:**

```stata
* Create Lee bounds for antisocial behavior (expected negative treatment effect)
lee_bounds_trimming_ado fam_asb_tyav, ///
    treatments(tpassonly cashassonly tpcashass) ///
    controls(control) ///
    attritvar(attrited) ///
    direction(minus) ///
    roundvar(round) round(7)
```

**Output:** Creates two trimmed versions of the outcome variable:

- `fam_asb_tyav_t_ub` - Upper bound (trimmed) version
- `fam_asb_tyav_t_lb` - Lower bound (trimmed) version

---

## Regression and Table Output

### multivar_reg

**File:** `dofiles/functions/multivariate_reg.do` (lines 11-174)

**Purpose:** Runs multivariate regressions with multiple outcomes and exports formatted LaTeX tables.

**Syntax:**

```stata
multivar_reg [if], outcomevar(varlist) controls(varlist) specnames(string) filename(name) [options]
```

**Required Parameters:**

| Parameter | Type | Description |
| ----------- | ------ | ------------- |
| `outcomevar` | varlist | Outcome variables to regress |
| `controls` | varlist | Control/independent variables |
| `specnames` | string | Column header names for output table |
| `filename` | name | Output filename (saved to `outfiles/tables/`) |

**Optional Parameters:**

| Parameter | Type | Description |
| ----------- | ------ | ------------- |
| `cluster` | varname | Cluster variable for standard errors |
| `fixedeffects` | varlist | Fixed effects to absorb |
| `standarized` | flag | Standardize all variables before regression |
| `stars` | flag | Include significance stars in output |
| `supraheader` | string | Super-column headers |
| `supracount` | numlist | Number of columns per super-header |

**Example:**

```stata
multivar_reg, outcomevar(fam_asb fam_econ timepref) ///
    controls(tpassonly cashassonly tpcashass $base) ///
    specnames("ASB" "Economic" "Time Pref") ///
    filename(Table1_multivar) ///
    fixedeffects(tp_strata_alt cg_strata) ///
    stars
```

**Output:** LaTeX table saved to `outfiles/tables/Table1_multivar.tex`

---

### multivar_reg_mat

**File:** `dofiles/functions/multivariate_reg.do` (lines 187-282)

**Purpose:** Runs a single multivariate regression and stores results in matrices for later use with `mv_reg_table`.

**Syntax:**

```stata
multivar_reg_mat [if], outcomevar(varlist) controls(varlist) matname(name) [options]
```

**Parameters:**

| Parameter | Type | Description |
| ----------- | ------ | ------------- |
| `outcomevar` | varlist | Single outcome variable |
| `controls` | varlist | Control/independent variables |
| `matname` | name | Name prefix for output matrices |
| `cluster` | varname | Cluster variable (optional) |
| `fixedeffects` | varlist | Fixed effects (optional) |
| `standarized` | flag | Standardize variables (optional) |

**Output Matrices:**

- `{matname}` - Coefficient, SE, p-value matrix
- `{matname}sum` - Summary statistics (mean, sd) for controls
- `{matname}sta` - Regression statistics (N, clusters, R2, F-stat, etc.)

**Example:**

```stata
multivar_reg_mat, outcomevar(fam_asb_tyav) controls(tpassonly cashassonly tpcashass $base) ///
    matname(asb_10y) fixedeffects(tp_strata_alt cg_strata)

* Access results
matrix list asb_10y
matrix list asb_10ysta
```
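
For readers unfamiliar with working on stored result matrices, the sketch below assembles a small coefficient, SE, and p-value matrix from `r(table)` after a plain `regress` and reads one cell back by name. The layout shown (one row per regressor, columns `b`, `se`, `p`) is an assumption made for the example; the actual layout of `{matname}` should be checked with `matrix list`.

```stata
* Illustrative sketch only; the real {matname} layout may differ
sysuse auto, clear
regress price weight length

* r(table) stores b, se, t, pvalue, ... as rows; keep rows 1, 2 and 4
matrix T = r(table)
matrix mb  = T[1, 1...]
matrix mse = T[2, 1...]
matrix mp  = T[4, 1...]
matrix demo = (mb \ mse \ mp)'
matrix colnames demo = b se p

matrix list demo

* Read a single cell by row and column name
display "weight coefficient: " demo[rownumb(demo, "weight"), colnumb(demo, "b")]
```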

---

### mv_reg_table

**File:** `dofiles/functions/multivariate_reg.do` (lines 292-532)

**Purpose:** Combines multiple regression result matrices into a formatted LaTeX table.

**Syntax:**

```stata
mv_reg_table, resultsmat(namelist) rowvars(varlist) specnames(string) filename(name) [options]
```

**Required Parameters:**

| Parameter | Type | Description |
| ----------- | ------ | ------------- |
| `resultsmat` | namelist | Names of result matrices from `multivar_reg_mat` |
| `rowvars` | varlist | Variables to include as rows |
| `specnames` | string | Column specification names |
| `filename` | name | Output filename |

**Optional Parameters:**

| Parameter | Type | Description |
| ----------- | ------ | ------------- |
| `stats` | namelist | Statistics to include (N, N_clust, r2_a, ll, F, dvmean) |
| `stars` | flag | Include significance stars |
| `feind`, `felab` | varlist, string | Fixed effects indicator row |
| `clusterind`, `clusterlab` | varlist, string | Cluster indicator row |
| `omittedind`, `omittedlab` | varlist, string | Omitted variables indicator |
| `novalues` | name | Set to `yes` to exclude p-values |
| `supraheader`, `supracount`, `supralenght` | options | Super-column formatting |

**Example:**

```stata
* First run individual regressions
multivar_reg_mat, outcomevar(fam_asb_tyav) controls($base) matname(spec1)
multivar_reg_mat, outcomevar(fam_asb_tyav) controls($base_small) matname(spec2)

* Combine into table
mv_reg_table, resultsmat(spec1 spec2) rowvars(tpassonly cashassonly tpcashass) ///
    specnames("Full Controls" "Limited Controls") ///
    filename(robustness_table) stats(N r2_a) stars
```

---

## Temporal Heterogeneity Tests

### round_test_reg

**File:** `dofiles/functions/reg_tests.do` (lines 141-307)

**Purpose:** Tests for differential treatment effects across survey waves (temporal heterogeneity).

**Syntax:**

```stata
round_test_reg [if], outcomevars(namelist) treatvars(varlist) partid(varname) rounds(numlist) wavenames(namelist) roundvar(varname) wavevar(varname) filename(name) [options]
```

**Required Parameters:**

| Parameter | Type | Description |
| ----------- | ------ | ------------- |
| `outcomevars` | namelist | Outcome variable stems |
| `treatvars` | varlist | Treatment indicators |
| `partid` | varname | Panel identifier |
| `rounds` | numlist | Survey round numbers |
| `wavenames` | namelist | Wave suffixes (e.g., `ltav tyav`) |
| `roundvar` | varname | Variable identifying rounds |
| `wavevar` | varname | Wave indicator (0/1) |
| `filename` | name | Output filename |

**Optional Parameters:**

| Parameter | Type | Description |
| ----------- | ------ | ------------- |
| `controls` | varlist | Control variables |
| `fixedeffects` | varlist | Fixed effects |
| `clusterse` | varlist | Cluster variable |
| `repooled` | flag | Create pooled outcome from wave-specific variables |

**How It Works:**

1. Creates treatment x wave interactions
2. Runs separate regressions for each wave
3. Runs pooled regression with interactions
4. Tests equality of coefficients across waves
5. Reports wave-specific effects and difference p-values
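
The steps above can be sketched with built-in commands on a simulated two-wave panel. This is only an illustration of the testing logic, not the code in `reg_tests.do`: it uses `regress` with factor-variable interactions instead of `reghdfe`, a single treatment indicator, and hypothetical variable names.

```stata
* Illustrative sketch only; simulated two-wave panel
clear
set seed 42
set obs 200
generate partid = _n
generate treat = runiform() < 0.5
expand 2                                  // two rows per participant
bysort partid: generate wave = _n - 1     // 0 = earlier wave, 1 = later wave
generate y = 0.3 * treat + 0.2 * treat * wave + rnormal()

* Step 2: separate regressions for each wave
regress y treat if wave == 0, vce(cluster partid)
regress y treat if wave == 1, vce(cluster partid)

* Steps 1 and 3: pooled regression with treatment x wave interaction
regress y i.treat##i.wave, vce(cluster partid)

* Step 4: equality of the treatment effect across waves
* is a test that the interaction coefficient is zero
test 1.treat#1.wave
```

Clustering on the panel identifier in the pooled regression accounts for the same participant appearing in both waves.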

**Example:**

```stata
round_test_reg, outcomevars(fam_asb fam_econ) ///
    treatvars(tpassonly cashassonly tpcashass) ///
    partid(partid) rounds(5 6 7 8) wavenames(ltav tyav) ///
    roundvar(round) wavevar(wave10yr) ///
    filename(Table2_LTvs10Y) ///
    controls($base) fixedeffects(tp_strata_alt cg_strata) ///
    repooled
```

---

### round_test_reg_NSM

**File:** `dofiles/functions/reg_tests.do` (lines 313-485)

**Purpose:** Variant of `round_test_reg` with a different output format (NSM: no sample-mean columns). It adds 6 empty columns for visual spacing.

**Additional Parameter:**

| Parameter | Type | Description |
| ----------- | ------ | ------------- |
| `col_0_title` | string | Title for first column |

---

### het_test_reg

**File:** `dofiles/functions/reg_tests.do` (lines 497-592)

**Purpose:** Tests for heterogeneous treatment effects across waves using interaction terms.

**Methodology:**

1. Creates treatment x wave interactions
2. Runs single pooled regression with main effects and interactions
3. Reports coefficients for: wave indicator, treatment main effects, treatment x wave interactions
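
A minimal sketch of this specification with two treatment arms and hand-built interaction variables follows. The data are simulated, the names (`t_a`, `t_b`, `*_x_wave`) are hypothetical, and `regress` stands in for whatever estimator the function actually calls.

```stata
* Illustrative sketch only; simulated data, hypothetical names
clear
set seed 7
set obs 300
generate partid = _n
generate arm = runiformint(1, 3)          // 1 = control, 2 and 3 = treatment arms
generate t_a = arm == 2
generate t_b = arm == 3
expand 2
bysort partid: generate wave = _n - 1
generate y = 0.2*t_a + 0.4*t_b + 0.1*wave + 0.3*t_b*wave + rnormal()

* Step 1: treatment x wave interactions
foreach t in t_a t_b {
    generate `t'_x_wave = `t' * wave
}

* Step 2: one pooled regression with main effects and interactions
regress y wave t_a t_b t_a_x_wave t_b_x_wave, vce(cluster partid)

* Joint test that no arm's effect differs across waves
test t_a_x_wave t_b_x_wave
```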

---

### equation_test

**File:** `dofiles/functions/reg_tests.do` (lines 64-132)

**Purpose:** Tests equality of treatment effects across time periods using `suest` (seemingly unrelated estimation).

**Methodology:**

1. Runs separate regressions for each time period
2. Combines estimates using `suest`
3. Tests coefficient equality across equations
4. Reports 1-year effects, 10-year effects, and p-values for differences
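
The `suest` workflow in steps 1 to 3 looks roughly like this on simulated data. The outcome names (`y_1y`, `y_10y`) and stored-estimate names (`m1y`, `m10y`) are hypothetical, and the sketch omits controls and fixed effects.

```stata
* Illustrative sketch only; simulated data, hypothetical names
clear
set seed 99
set obs 500
generate treat = runiform() < 0.5
generate y_1y  = 0.5 * treat + rnormal()
generate y_10y = 0.2 * treat + rnormal()

* Step 1: separate regressions, stored without robust options
* (suest computes its own robust covariance matrix)
regress y_1y treat
estimates store m1y
regress y_10y treat
estimates store m10y

* Step 2: combine the stored estimates
suest m1y m10y

* Step 3: test equality of the treatment coefficient across equations;
* after regress, suest names the coefficient equations <name>_mean
test [m1y_mean]treat = [m10y_mean]treat
```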

**Example:**

```stata
equation_test, outcomevars(fam_asb fam_econ) ///
    filename(Table_equality_test) ///
    controls($base)
```

---

## Dependencies

All functions require the following packages (installed via `STYL_10Yrep_packages.do`):

- **reghdfe**: High-dimensional fixed effects regression
- **frmttable**: Table formatting (part of `outreg`)
- **ftools**: Fast data manipulation
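
If the packages need to be installed by hand, something along the following lines should work. This is a hypothetical sketch, not the contents of `STYL_10Yrep_packages.do`, and it assumes all three packages are available from SSC under the names shown.

```stata
* Hypothetical sketch; install each package only if it is not already found

* ftools first, since reghdfe depends on it
capture which ftools
if _rc ssc install ftools, replace

capture which reghdfe
if _rc ssc install reghdfe, replace

* frmttable is distributed with outreg, per the list above
capture which frmttable
if _rc ssc install outreg, replace
```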

## Notes

- All output tables are saved to `outfiles/tables/` in LaTeX fragment format
- Matrix results use naming convention: `{matname}`, `{matname}sum`, `{matname}sta`
- Global macros named `{matname}fe` and `{matname}se` (for example, `$asb_10yfe` when `matname(asb_10y)` is used) store the fixed effects and cluster specifications
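
The naming convention for the global macros can be tried out directly. In the sketch below, the macro contents are made-up values chosen for illustration; only the pattern of building the global's name from `matname` reflects the convention described above.

```stata
* Illustrative sketch only; macro contents are hypothetical
local matname asb_10y
global `matname'fe "tp_strata_alt cg_strata"
global `matname'se "partid"

* Two equivalent ways to read the fixed effects specification back
display "${`matname'fe}"
display "$asb_10yfe"
```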
